Czech-to-slovak adapted broadcast news transcription system

نویسندگان

  • Jan Nouza
  • Jan Silovský
  • Jindrich Zdánský
  • Petr Cerva
  • Martin Kroul
  • Josef Chaloupka
چکیده

The first broadcast news (BN) transcription system for Slovak is introduced. It employs the same modules as the system we developed earlier for Czech. We utilize similarity between the two languages in efficient lexicon building, in mapping Slovak specific (rarely occurring) phonemes onto Czech ones and in low-resource cross-lingual adaptation of acoustic model. The system uses 166K-word lexicon and on the Slovak part of European COST278 BN database achieves 23.6% WER (which is only 5% less than the original, longterm optimized Czech system). Similar results were achieved also on recently recorded data from four Slovak stations.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

TUKE-BNews-SK: Slovak Broadcast News Corpus Construction and Evaluation

This article presents an overview of the existing acoustical corpuses suitable for broadcast news automatic transcription task in the Slovak language. The TUKE-BNews-SK database created in our department was built to support the application development for automatic broadcast news processing and spontaneous speech recognition of the Slovak language. The audio corpus is composed of 479 Slovak TV...

متن کامل

Fully automated system for Czech spoken broadcast transcription with very large (300k+) lexicon

We present a system developed for fully automated processing of Czech spoken broadcast programs. It includes modules for unsupervised segmentation of audio stream, speaker and gender recognition followed by speaker adaptation, and own speech decoder designed for extremely large vocabularies. Compared to our previous results reported in 2004, the new system reduced the WER (evaluated on the Czec...

متن کامل

Online Temporal Language Model Adaptation for a Thai Broadcast News Transcription System

This paper investigates the effectiveness of online temporal language model adaptation when applied to a Thai broadcast news transcription task. Our adaptation scheme works as follow: first an initial language model is trained with broadcast news transcription available during the development period. Then the language model is adapted over time with more recent broadcast news transcription and ...

متن کامل

Very large vocabulary speech recognition system for automatic transcription of czech broadcast programs

This paper describes the first speech recognition system capable of transcribing a wide range of spoken broadcast programs in Czech language with the OOV rate being below 3 per cent. To achieve that level we had to a) create an optimized 200k word vocabulary with multiple text and pronunciation forms, b) extract an appropriate language model from a 300M word text corpus and c) develop an own de...

متن کامل

Cross-Lingual Adaptation of Broadcast Transcription System to Polish Language Using Public Data Sources

We present methods and procedures designed for cost-efficient adaptation of an existing speech recognition system to Polish. The system (originally built for Czech language) is adapted using common texts and speech recordings accessible from Polish web-pages. The most critical part, an acoustic model (AM) for Polish, is built in several steps, which include: a) an initial bootstrapping phase th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008